indicator variable
Deep classifier kriging for probabilistic spatial prediction of air quality index
Chen, Junyu, Nag, Pratik, Judy-Wang, Huixia, Sun, Ying
Accurate spatial interpolation of the air quality index (AQI), computed from concentrations of multiple air pollutants, is essential for regulatory decision-making, yet AQI fields are inherently non-Gaussian and often exhibit complex nonlinear spatial structure. Classical spatial prediction methods such as kriging are linear and rely on Gaussian assumptions, which limits their ability to capture these features and to provide reliable predictive distributions. In this study, we propose \textit{deep classifier kriging} (DCK), a flexible, distribution-free deep learning framework for estimating full predictive distribution functions for univariate and bivariate spatial processes, together with a \textit{data fusion} mechanism that enables modeling of non-collocated bivariate processes and integration of heterogeneous air pollution data sources. Through extensive simulation experiments, we show that DCK consistently outperforms conventional approaches in predictive accuracy and uncertainty quantification. We further apply DCK to probabilistic spatial prediction of AQI by fusing sparse but high-quality station observations with spatially continuous yet biased auxiliary model outputs, yielding spatially resolved predictive distributions that support downstream tasks such as exceedance and extreme-event probability estimation for regulatory risk assessment and policy formulation.
- North America > United States > California (0.05)
- North America > United States > Louisiana (0.04)
- Oceania > Australia > New South Wales > Wollongong (0.04)
- (9 more...)
- Law (1.00)
- Health & Medicine (0.93)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Security & Privacy (0.34)
- Information Technology > Modeling & Simulation (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Interpretable Generalized Additive Models for Datasets with Missing Values
Many important datasets contain samples that are missing one or more feature values. Maintaining the interpretability of machine learning models in the presence of such missing data is challenging. Singly or multiply imputing missing values complicates the model's mapping from features to labels. On the other hand, reasoning on indicator variables that represent missingness introduces a potentially large number of additional terms, sacrificing sparsity. We solve these problems with M-GAM, a sparse, generalized, additive modeling approach that incorporates missingness indicators and their interaction terms while maintaining sparsity through $\ell_0$ regularization. We show that M-GAM provides similar or superior accuracy to prior methods while significantly improving sparsity relative to either imputation or naïve inclusion of indicator variables.
The Infinite Mixture of Infinite Gaussian Mixtures
Halid Z. Yerebakan, Bartek Rajwa, Murat Dundar
Dirichlet process mixture of Gaussians (DPMG) has been used in the literature for clustering and density estimation problems. However, many real-world data exhibit cluster distributions that cannot be captured by a single Gaussian. Modeling such data sets by DPMG creates several extraneous clusters even when clusters are relatively well-defined.
- North America > United States > Indiana > Marion County > Indianapolis (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Middlesex County > Natick (0.04)
- (2 more...)
Interpretable Generalized Additive Models for Datasets with Missing Values
Many important datasets contain samples that are missing one or more feature values. Maintaining the interpretability of machine learning models in the presence of such missing data is challenging. Singly or multiply imputing missing values complicates the model's mapping from features to labels. On the other hand, reasoning on indicator variables that represent missingness introduces a potentially large number of additional terms, sacrificing sparsity. We solve these problems with M-GAM, a sparse, generalized, additive modeling approach that incorporates missingness indicators and their interaction terms while maintaining sparsity through \ell_0 regularization.
The Infinite Mixture of Infinite Gaussian Mixtures
Halid Z. Yerebakan, Bartek Rajwa, Murat Dundar
Dirichlet process mixture of Gaussians (DPMG) has been used in the literature for clustering and density estimation problems. However, many real-world data exhibit cluster distributions that cannot be captured by a single Gaussian. Modeling such data sets by DPMG creates several extraneous clusters even when clusters are relatively well-defined.
- North America > United States > Indiana > Marion County > Indianapolis (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Middlesex County > Natick (0.04)
- (2 more...)
Towards Probabilistic Planning of Explanations for Robot Navigation
Halilovic, Amar, Krivic, Senka
In robotics, ensuring that autonomous systems are comprehensible and accountable to users is essential for effective human-robot interaction. This paper introduces a novel approach that integrates user-centered design principles directly into the core of robot path planning processes. We propose a probabilistic framework for automated planning of explanations for robot navigation, where the preferences of different users regarding explanations are probabilistically modeled to tailor the stochasticity of the real-world human-robot interaction and the communication of decisions of the robot and its actions towards humans. This approach aims to enhance the transparency of robot path planning and adapt to diverse user explanation needs by anticipating the types of explanations that will satisfy individual users.
- Europe > Bosnia and Herzegovina > Federation of Bosnia and Herzegovina > Sarajevo Canton > Sarajevo (0.05)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- Europe > Germany (0.04)
The Infinite Mixture of Infinite Gaussian Mixtures
Dirichlet process mixture of Gaussians (DPMG) has been used in the literature for clustering and density estimation problems. However, many real-world data exhibit cluster distributions that cannot be captured by a single Gaussian. Modeling such data sets by DPMG creates several extraneous clusters even when clusters are relatively well-defined.
- North America > United States > Indiana > Marion County > Indianapolis (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Middlesex County > Natick (0.04)
- (2 more...)
Predicting Failure of P2P Lending Platforms through Machine Learning: The Case in China
Yeh, Jen-Yin, Chiu, Hsin-Yu, Huang, Jhih-Huei
This study employs machine learning models to predict the failure of Peer-to-Peer (P2P) lending platforms, specifically in China. By employing the filter method and wrapper method with forward selection and backward elimination, we establish a rigorous and practical procedure that ensures the robustness and importance of variables in predicting platform failures. The research identifies a set of robust variables that consistently appear in the feature subsets across different selection methods and models, suggesting their reliability and relevance in predicting platform failures. The study highlights that reducing the number of variables in the feature subset leads to an increase in the false acceptance rate while the performance metrics remain stable, with an AUC value of approximately 0.96 and an F1 score of around 0.88. The findings of this research provide significant practical implications for regulatory authorities and investors operating in the Chinese P2P lending industry.
- Information Technology > Services > e-Commerce Services (1.00)
- Banking & Finance > Loans (1.00)
Data Interpretation over Plots
Methani, Nitesh, Ganguly, Pritha, Khapra, Mitesh M., Kumar, Pratyush
Reasoning over plots by question answering (QA) is a challenging machine learning task at the intersection of vision, language processing, and reasoning. Existing synthetic datasets (FigureQA, DVQA) do not model variability in data labels, real-valued data, or complex reasoning questions. Consequently, proposed models for these datasets do not fully address the challenge of reasoning over plots. We propose PlotQA with 8.1 million question-answer pairs over 220,000 plots with data from real-world sources and questions based on crowd-sourced question templates. 26% of the questions in PlotQA have answers that are not in a fixed vocabulary, requiring reasoning capabilities. Analysis of existing models on PlotQA reveals that a hybrid model is required: Specific questions are answered better by choosing the answer from a fixed vocabulary or by extracting it from a predicted bounding box in the plot, while other questions are answered with a table question-answering engine which is fed with a structured table extracted by visual element detection. For the latter, we propose the VOES pipeline and combine it with SAN-VQA to form a hybrid model SAN-VOES. On the DVQA dataset, SAN-VOES model has an accuracy of 58%, significantly improving on highest reported accuracy of 46%. On the PlotQA dataset, SAN-VOES has an accuracy of 54%, which is the highest amongst all the models we trained. Analysis of each module in the VOES pipeline reveals that further improvement in accuracy requires more accurate visual element detection.
- North America > United States > Indiana > Boone County > Lebanon (0.04)
- Asia > Middle East > Lebanon (0.04)
- South America > Brazil (0.04)
- (13 more...)
Bayesian Nonparametrics for Non-exhaustive Learning
Cheng, Yicheng, Rajwa, Bartek, Dundar, Murat
Non-exhaustive learning (NEL) is an emerging machine-learning paradigm designed to confront the challenge of non-stationary environments characterized by anon-exhaustive training sets lacking full information about the available classes.Unlike traditional supervised learning that relies on fixed models, NEL utilizes self-adjusting machine learning to better accommodate the non-stationary nature of the real-world problem, which is at the root of many recently discovered limitations of deep learning. Some of these hurdles led to a surge of interest in several research areas relevant to NEL such as open set classification or zero-shot learning. The presented study which has been motivated by two important applications proposes a NEL algorithm built on a highly flexible, doubly non-parametric Bayesian Gaussian mixture model that can grow arbitrarily large in terms of the number of classes and their components. We report several experiments that demonstrate the promising performance of the introduced model for NEL.